M-ESTIMATION OF LINEAR MODELS WITH DEPENDENT ERRORS

University of Chicago

We study asymptotic properties of M-estimates of regression pa- 
rameters in linear models in which errors are dependent. Weak and 
strong Bahadur representations of the M-estimates are derived and
a central limit theorem is established. The results are applied to lin- 
ear models with errors being short-range dependent linear processes, 
heavy-tailed linear processes and some widely used nonlinear time 
series. 

1. Introduction. Consider the linear model

(1)  $y_i = x_i'\beta + e_i$,  $1 \le i \le n$,

where $\beta$ is a $p \times 1$ unknown regression coefficient vector,
$x_i = (x_{i1}, \ldots, x_{ip})'$ are $p \times 1$ known (nonstochastic) design
vectors and $e_i$ are errors. We estimate the unknown parameter vector $\beta$
by minimizing

(2)  $\sum_{i=1}^n \rho(y_i - x_i'\beta)$,

where $\rho$ is a convex function. Important examples include Huber's estimate
with $\rho(x) = (x^2 1_{|x| \le c})/2 + (c|x| - c^2/2) 1_{|x| > c}$, $c > 0$, the
$L^q$ regression estimate with $\rho(x) = |x|^q$, $1 \le q \le 2$, and regression
quantiles with $\rho(x) = \rho_\alpha(x) = \alpha x^+ + (1 - \alpha)(-x)^+$,
$0 < \alpha < 1$, where $x^+ = \max(x, 0)$. In particular, if $q = 1$ or
$\alpha = 1/2$, then the minimizer of (2) is called the least absolute deviation
(LAD) estimate. See [2] and [68] for $L^q$ regression estimates and [35] for
regression quantiles. See also [34] for an excellent account of quantile
regression.
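
The three loss functions above and the criterion (2) can be written down directly. The following is a minimal sketch rather than part of the original text: the function names and the default tuning constants (for instance c = 1.345 for Huber's loss) are illustrative assumptions.

```python
import numpy as np

def huber_rho(x, c=1.345):
    """Huber's loss: quadratic for |x| <= c, linear beyond (c is an assumed default)."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= c, 0.5 * x**2, c * np.abs(x) - 0.5 * c**2)

def lq_rho(x, q=1.5):
    """L^q loss |x|^q, intended for 1 <= q <= 2."""
    return np.abs(np.asarray(x, dtype=float)) ** q

def quantile_rho(x, alpha=0.5):
    """Check loss alpha * x^+ + (1 - alpha) * (-x)^+ with x^+ = max(x, 0)."""
    x = np.asarray(x, dtype=float)
    return alpha * np.maximum(x, 0.0) + (1.0 - alpha) * np.maximum(-x, 0.0)

def objective(beta, X, y, rho):
    """Criterion (2): sum_i rho(y_i - x_i' beta) for a design matrix X with rows x_i'."""
    return float(np.sum(rho(y - X @ beta)))
```

Any convex minimizer can then be applied to `objective`; with `quantile_rho` and alpha = 1/2 the criterion is one half of the LAD criterion, so both share the same minimizers.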
Let $\hat\beta_n$ be the minimizer of (2) and let $\beta_0$ be the true parameter.
There is a substantial amount of work concerning asymptotic properties of
$\hat\beta_n - \beta_0$ for various forms of $\rho$ (not necessarily convex); see,
for example, [2, 3, 4, 7, 8, 10, 12, 26, 30, 32, 49, 57, 66, 67] and [69] among
others. Deep results such as Bahadur representations have also been obtained.
However, in the majority of the previous work it is assumed that the errors
$e_i$ are independent. The asymptotic problem of M-estimation of linear models
with dependent errors is practically important but theoretically challenging.
Huber [29, 30] commented that the assumption of independence is a serious
restriction. See also [24].

In this paper we shall relax the independence assumption in the classi- 
cal M-estimation theory so that a very general class of dependent errors 
is allowed. Specifically, we shall establish a Bahadur representation and a 
central limit theorem for $\hat\beta_n - \beta_0$ for the linear model (1) with the errors
$(e_i)$ being stationary causal processes. In the early literature very restrictive
assumptions were imposed on the error process $(e_i)$. Typical examples are
strongly mixing processes of various types. See [13, 36] and [41] among others
for strong mixing processes and [45] for $\varphi$-mixing processes. Berlinet, Liese
and Vajda [9] obtained consistency of M-estimators for regression models 
with strong mixing errors. Gastwirth and Rubin [20] considered the behavior 
of L-estimators of strong mixing Gaussian processes and first-order autore- 
gressive processes with double exponential marginals. It is generally not easy 
to verify strong mixing conditions. For example, for linear processes to be 
strong mixing, very restrictive conditions are needed on the decay rate of 
the coefficients [17, 22, 40, 60]. Portnoy [43, 44] and Lee and Martin [38] 
investigated the effect of dependence on robust location estimators by as- 
suming that the errors are autoregressive moving average processes with 
finite orders. 

To the best of our knowledge, it seems that the problem of Bahadur rep- 
resentations has been rarely studied for M-estimates of linear models with 
nonstrong mixing errors. The Bahadur-type representations provide signifi- 
cant insight into the asymptotic behavior of an estimator by approximating 
it by a linear form. For sample quantiles it has been investigated by Hesse 
[27], Babu and Singh [5] and Wu [62], among others. Babu [4] considered 
LAD estimators for linear models with strong mixing errors. 

For the errors $(e_i)$ we confine ourselves to stationary causal processes.
Namely, let

(3)  $e_i = G(\ldots, \varepsilon_{i-1}, \varepsilon_i)$,

where $\varepsilon_k$, $k \in \mathbb{Z}$, are independent and identically
distributed (i.i.d.) random variables and $G$ is a measurable function such that
$e_i$ is a proper random variable. Here $\mathbb{Z}$ is the set of integers. The
framework (3) is a natural paradigm for nonlinear time series models and it
represents a huge class of stationary processes which appear frequently in
practice. As in [46, 53, 59] and [63], (3) can be interpreted as a physical
system with the innovations $\varepsilon_i$ being the inputs that drive the
system, $G$ being a filter and $e_i$ being the output. This interpretation leads
to our dependence measures. The Wiener conjecture states that every stationary
and ergodic process $(e_i)$ can be expressed in the form of (3); see [33, 50, 51]
and [55], page 204.

Let the shift process $\mathcal{F}_k = (\ldots, \varepsilon_{k-1}, \varepsilon_k)$.
For $i \in \mathbb{N}$ let $F_i(u|\mathcal{F}_0) = P(e_i \le u|\mathcal{F}_0)$
[resp., $f_i(u|\mathcal{F}_0)$] be the conditional distribution (resp., density)
function of $e_i$ at $u$ given $\mathcal{F}_0$ and let $f$ be the marginal density
of $e_i$. Let $l \ge 0$. For a function $g$, write $g \in \mathcal{C}^l$ if $g$ has
derivatives up to $l$th-order and $g^{(l)}$ is continuous. Denote by
$f_i^{(l)}(u|\mathcal{F}_0) = \partial^l f_i(u|\mathcal{F}_0)/\partial u^l$ the
$l$th-order derivative if it exists. Let $(\varepsilon_i')$ be an i.i.d. copy of
$(\varepsilon_i)$,
$\mathcal{F}_k^* = (\ldots, \varepsilon_{-1}, \varepsilon_0', \varepsilon_1, \ldots, \varepsilon_k)$
and $e_k^* = G(\mathcal{F}_k^*)$. Then $\mathcal{F}_k^*$ is a coupled version of
$\mathcal{F}_k$ with $\varepsilon_0$ replaced by $\varepsilon_0'$,
$\mathcal{F}_j^* = \mathcal{F}_j$, $j < 0$, and $e_k$ and $e_k^*$ are identically
distributed. Our short-range dependence (SRD) conditions suggest that a certain
distance between the two predictive distributions $[e_i|\mathcal{F}_0]$ and
$[e_i^*|\mathcal{F}_0^*]$ is summable over $i \ge 1$. Since those conditions are
directly related to the data-generating mechanism of $(e_i)$, they are often
easily verified; see applications in Section 3.
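
To make the coupling concrete, here is a minimal sketch under strong simplifying assumptions that are not in the text: the process is a Gaussian AR(1) recursion $e_i = a e_{i-1} + \varepsilon_i$, the infinite past is truncated by a finite burn-in, and the function name `coupled_ar1` is hypothetical. Replacing $\varepsilon_0$ by an independent copy changes $e_i$ by exactly $a^i(\varepsilon_0 - \varepsilon_0')$, so the gap between the original and the coupled path decays geometrically.

```python
import numpy as np

def coupled_ar1(a, n, burn=200, seed=0):
    """Return (e_0..e_n) and the coupled path (e_0^*..e_n^*) of an AR(1) process.

    The coupled path uses the same innovations except that eps_0 is replaced
    by an independent copy, mimicking F_k^* versus F_k.
    """
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(burn + n + 1)   # eps[burn] plays the role of eps_0
    eps_star = eps.copy()
    eps_star[burn] = rng.standard_normal()    # independent copy eps_0'

    def run(innov):
        e = np.zeros(len(innov))
        for t in range(1, len(innov)):
            e[t] = a * e[t - 1] + innov[t]
        return e[burn:]                       # keep times 0, 1, ..., n

    return run(eps), run(eps_star)

e, e_star = coupled_ar1(a=0.6, n=10)
gap = np.abs(e - e_star)                      # |e_i - e_i^*| for i = 0..n
print(gap[1:] / gap[:-1])                     # each ratio is (up to rounding) a = 0.6
```

For this toy model the summability of the gaps over $i$ is immediate; the SRD conditions of the paper ask for the analogous summability at the level of predictive distributions.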

The paper is structured as follows. Section 2 presents our main results 
on Bahadur representations and central limit theorems for $\hat\beta_n - \beta_0$. Section
3 contains applications to linear models with errors being short-range de- 
pendent linear processes, heavy-tailed linear processes where M-estimation 
is particularly relevant, and some widely used nonlinear time series. Proofs 
are given in Section 4. 

2. Main results. Without loss of generality, assume throughout the paper that
the true parameter $\beta_0 = 0$. We first introduce some notation. Let
$\lceil a \rceil = \min\{k \in \mathbb{Z} : k \ge a\}$ and
$\lfloor a \rfloor = \max\{k \in \mathbb{Z} : k \le a\}$, $a \in \mathbb{R}$, be the
usual ceiling and floor functions. For a $p$-dimensional vector
$v = (v_1, \ldots, v_p)$ let $|v| = (\sum_{i=1}^p v_i^2)^{1/2}$. A random vector $V$
is said to be in $\mathcal{L}^q$, $q > 0$, if $E(|V|^q) < \infty$. In this case
write $\|V\|_q = [E(|V|^q)]^{1/q}$ and $\|V\| = \|V\|_2$. Let the covariance matrix
of a $p$-dimensional column random vector $V$ be
$\mathrm{var}(V) = E(VV') - E(V)E(V')$. Define projection operators
$\mathcal{P}_k$, $k \in \mathbb{Z}$, by
$\mathcal{P}_k V = E(V|\mathcal{F}_k) - E(V|\mathcal{F}_{k-1})$, $V \in \mathcal{L}^1$.
The symbol $C$ denotes a generic constant which may vary from place to place.
For a sequence of random variables $(\eta_n)$ and a positive sequence $(d_n)$,
write $\eta_n = o_{a.s.}(d_n)$ if $\eta_n/d_n$ converges to 0 almost surely and
$\eta_n = O_{a.s.}(d_n)$ if $\eta_n/d_n$ is almost surely bounded. We can similarly
define the relations $o_P$ and $O_P$. Let $N(\mu, \Sigma)$ denote a multivariate
normal distribution with mean vector $\mu$ and covariance matrix $\Sigma$.

Let the model matrix $X_n = (x_1, \ldots, x_n)'$ and $\Sigma_n = X_n'X_n$. Assume
that $\Sigma_n$ is nonsingular for large $n$. It is convenient to consider the
rescaled model

(4)  $y_i = z_{i,n}'\theta + e_i$,

where $z_{i,n} = \Sigma_n^{-1/2} x_i$ and $\theta = \theta_n = \Sigma_n^{1/2}\beta$.
Studying the asymptotic behavior of $\hat\beta_n$ is equivalent to studying that
of $\hat\theta_n = \Sigma_n^{1/2}\hat\beta_n$, which is a minimizer of
$\sum_{i=1}^n \rho(y_i - z_{i,n}'\theta)$. If there are multiple minimizers, we
just choose any such minimizer. Observe that
$\sum_{i=1}^n z_{i,n} z_{i,n}' = \mathrm{Id}_p$, the $p \times p$ identity matrix.
For $q > 0$ define

(5)  $\zeta_n(q) = \sum_{i=1}^n |z_{i,n}|^q$  and  $\xi_n(q) = \sum_{i=1}^n |x_i|^q$.

Assume that $\rho$ has derivative $\psi$. Define the $k$th-step-ahead predicted
function

(6)  $\psi_k(t; \mathcal{F}_0) = E[\psi(e_k + t)|\mathcal{F}_0]$,  $k \ge 0$.

The function $\psi_k(\cdot; \cdot)$ plays an important role in the study of the
asymptotic behavior of $\hat\theta_n$. We now list some regularity conditions on
$\rho$, $x_i$ and the errors $e_i$:

(A1) $\rho$ is a convex function, $E[\psi(e_1)] = 0$ and $\|\psi(e_1)\|^2 > 0$.
(A2) $\varphi(t) := E[\psi(e_1 + t)]$ has a strictly positive derivative at $t = 0$.
(A3) $m(t) := \|\psi(e_1 + t) - \psi(e_1)\|$ is continuous at $t = 0$.
(A4) $r_n := \max_{i \le n} |z_{i,n}| = \max_{i \le n} (x_i'\Sigma_n^{-1}x_i)^{1/2} = o(1)$.

Conditions (A1)-(A4) are standard and they are often imposed in the
M-estimation theory of linear models with independent errors; see, for
example, [7]. In (A1), the error process $(e_i)$ itself is allowed to have
infinite variance, which is actually one of the primary reasons for robust
estimation. Section 3 contains an application to linear models with dependent
heavy-tailed errors. Under (A2), $\theta$ is estimable or separable. Condition
(A3) is very mild. Note that $\psi$ is nondecreasing and it has at most countably
many discontinuity points. If $e_i$ has a continuous distribution function and
$\|\psi(e_1 + t_0)\| + \|\psi(e_1 - t_0)\| < \infty$ for some $t_0 > 0$, then
$\lim_{t \to 0} \psi(e_1 + t) = \psi(e_1)$ almost surely and (A3) follows from the
Lebesgue dominated convergence theorem.

The uniform asymptotic negligibility condition (A4) is basically the
Lindeberg-Feller-type condition. With (A4), the diagonal elements of the
hat matrix $X_n\Sigma_n^{-1}X_n'$ are uniformly negligible. Let
$x_{i_1}, \ldots, x_{i_p}$ be linearly independent, $1 \le i_1 < \cdots < i_p$, and
$Q = (x_{i_1}, \ldots, x_{i_p})$. Then $Q$ is nonsingular,
$Q'\Sigma_n^{-1}Q \to 0$ and consequently $\Sigma_n^{-1} \to 0$. The latter implies
that the minimum eigenvalue of $\Sigma_n$ diverges to $\infty$ and it is a
classical condition for weak consistency of the least squares estimators [18].
For the regression model (1) with i.i.d. errors $e_i$ having mean 0 and finite
variance, (A4) is necessary and sufficient for the least squares estimator
$\Sigma_n^{-1}X_n'(y_1, \ldots, y_n)'$ to be asymptotically normal (see [30],
Section 7.2, and [21]).
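
The rescaling and condition (A4) are easy to inspect numerically. The sketch below is illustrative only: the helper names `rescale_design` and `zeta` are assumptions, and the symmetric square root of $\Sigma_n$ is computed by an eigendecomposition. It returns the rows $z_{i,n}'$ together with $r_n$, whose square is the largest diagonal element of the hat matrix.

```python
import numpy as np

def rescale_design(X):
    """Return Z (rows z_{i,n}' = x_i' Sigma_n^{-1/2}) and r_n = max_i |z_{i,n}|."""
    Sigma = X.T @ X
    w, V = np.linalg.eigh(Sigma)                   # Sigma_n is symmetric positive definite
    Sigma_inv_half = V @ np.diag(w ** -0.5) @ V.T  # symmetric inverse square root
    Z = X @ Sigma_inv_half
    r_n = float(np.max(np.linalg.norm(Z, axis=1)))
    return Z, r_n

def zeta(Z, q):
    """zeta_n(q) = sum_i |z_{i,n}|^q from definition (5)."""
    return float(np.sum(np.linalg.norm(Z, axis=1) ** q))

n = 500
X = np.column_stack([np.ones(n), np.arange(1, n + 1) / n])   # intercept and linear trend
Z, r_n = rescale_design(X)
print(np.round(Z.T @ Z, 10))   # identity matrix Id_p
print(r_n, zeta(Z, 2))         # r_n is small; zeta_n(2) equals p
```

Since $\sum_i z_{i,n} z_{i,n}' = \mathrm{Id}_p$, one always has $\zeta_n(2) = p$, and (A4) asks that no single design point carries a non-vanishing share of it.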
Besides the classical conditions (A1)-(A4), to obtain asymptotic properties
of $\hat\beta_n$ and $\hat\theta_n$ we certainly need appropriate dependence
conditions [cf. (7) and (14)]. They are expressed in terms of
$\psi_k(\cdot; \mathcal{F}_0)$. Recall
$\mathcal{F}_k^* = (\ldots, \varepsilon_{-1}, \varepsilon_0', \varepsilon_1, \ldots, \varepsilon_k)$
and $e_k^* = G(\mathcal{F}_k^*)$.

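2.1. Asymptotic normality. Theorem 1 asserts that $\hat\theta_n$ can be
approximated by the linear form $T_n = \sum_{i=1}^n \psi(e_i) z_{i,n}$ with an
$o_P(1)$ error. Due to the linearity it is easier to deal with $T_n$, which is
asymptotically normal under proper conditions (cf. Lemma 2).

Theorem 1. Assume (A1)-(A4) and, for some $\epsilon_0 > 0$,

(7)  $\sum_{i=0}^\infty \sup_{|\epsilon| \le \epsilon_0} \|E[\psi(e_i + \epsilon)|\mathcal{F}_0] - E[\psi(e_i^* + \epsilon)|\mathcal{F}_0^*]\| < \infty$.

Then we have

(8)  $\varphi'(0)\hat\theta_n - \sum_{i=1}^n \psi(e_i) z_{i,n} = o_P(1)$

and $\hat\theta_n = O_P(1)$. Additionally, if the limit

(9)  $\lim_{n \to \infty} \sum_{i=1}^{n-|k|} z_{i,n} z_{i+k,n}' = \Delta_k$

exists for each $k \in \mathbb{Z}$, then

(10)  $\varphi'(0)\hat\theta_n \Rightarrow N(0, \Delta)$,  where  $\Delta = \sum_{k \in \mathbb{Z}} E[\psi(e_0)\psi(e_k)]\Delta_k$.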

Theorem 1 ensures the consistency of $\hat\beta_n$: $\hat\theta_n = O_P(1)$ and
$\Sigma_n^{-1} \to 0$ imply that $\hat\beta_n = o_P(1)$. It is generally not trivial
to establish the consistency of M-estimators. The convexity condition is quite
useful in proving consistency; see [7, 23, 39] among others for regression
models with independent errors. Recently, Berlinet, Liese and Vajda [9]
considered consistency of M-estimates in regression models with strong mixing
errors. Their paper requires that the regressors $x_i$ satisfy the condition
that $n^{-1}\sum_{i=1}^n \delta_{x_i}$ converges to some probability measure, where
$\delta$ is the Dirac measure. This condition seems restrictive and it excludes
some interesting cases (cf. Remark 1). For linear models with stationary causal
errors, it is unclear how to establish the consistency and asymptotic normality
without the convexity of $\rho$.

We now discuss condition (7). Since
$\psi_i(\epsilon; \mathcal{F}_0) = E[\psi(e_i + \epsilon)|\mathcal{F}_0]$ is the
$i$th-step-ahead predicted mean, the quantity
$\|\psi_i(\epsilon; \mathcal{F}_0) - \psi_i(\epsilon; \mathcal{F}_0^*)\| = \|E[\psi(e_i + \epsilon)|\mathcal{F}_0] - E[\psi(e_i^* + \epsilon)|\mathcal{F}_0^*]\|$
measures the contribution of $\varepsilon_0$ in predicting $\psi(e_i + \epsilon)$.
Hence (7) suggests short-range dependence in the sense that the cumulative
contribution of $\varepsilon_0$ in predicting future values is finite. The
following proposition provides a sufficient condition for (7). Recall that
$F_i(\cdot|\mathcal{F}_0)$ is the conditional (or predictive) distribution
function of $e_i$ given $\mathcal{F}_0$ and $f_i(\cdot|\mathcal{F}_0)$ is the
conditional density. Let
$\bar\psi(u; \epsilon_0) = |\psi(u + \epsilon_0)| + |\psi(u - \epsilon_0)|$.

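Proposition 1. Condition (7) holds under either (i)

(11)  $\sum_{i=1}^\infty \bar\omega(i) < \infty$,  where  $\bar\omega(i) = \int_{\mathbb{R}} \|f_i(u|\mathcal{F}_0) - f_i(u|\mathcal{F}_0^*)\| \bar\psi(u; \epsilon_0)\,du$,

or (ii) $\rho(x) = \rho_\alpha(x) = \alpha x^+ + (1 - \alpha)(-x)^+$, $0 < \alpha < 1$, and

(12)  $\sum_{i=1}^\infty \omega(i) < \infty$,  where  $\omega(i) = \sup_{u \in \mathbb{R}} \|F_i(u|\mathcal{F}_0) - F_i(u|\mathcal{F}_0^*)\|$.

Proposition 1 easily follows from the identities
$E(1_{e_i \le u}|\mathcal{F}_0) = F_i(u|\mathcal{F}_0)$ and
$E[\psi(e_i + \epsilon)|\mathcal{F}_0] = \int_{\mathbb{R}} \psi(u + \epsilon) f_i(u|\mathcal{F}_0)\,du$.
We omit the details.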

In Proposition 1, case (ii) corresponds to quantile regression, an important
non-least squares procedure. Condition (12) can be interpreted as follows. If
the conditional distribution $[e_i|\mathcal{F}_0]$ does not depend on
$\varepsilon_0$, then $F_i(u|\mathcal{F}_0) - F_i(u|\mathcal{F}_0^*) = 0$. The
quantity $\sup_u \|F_i(u|\mathcal{F}_0) - F_i(u|\mathcal{F}_0^*)\|$ can thus be
interpreted as the contribution of $\varepsilon_0$ in predicting $e_i$. In other
words, $\sup_u \|F_i(u|\mathcal{F}_0) - F_i(u|\mathcal{F}_0^*)\|$ quantifies the
degree of dependence of the predictive distribution $[e_i|\mathcal{F}_0]$ on
$\varepsilon_0$. So (12) suggests that the cumulative contribution of
$\varepsilon_0$ in predicting future values $(e_i)_{i \ge 1}$ is finite. Condition
(11) delivers a similar message by incorporating the information of the target
function $\psi = \rho'$ as weights into the distance between the two predictive
distributions $[e_i|\mathcal{F}_0]$ and $[e_i^*|\mathcal{F}_0^*]$. To obtain
Bahadur representations, stronger versions of (11) and (12) are needed; see
(27) and (28).

Remark 1. Many of the earlier results require that $x_i$, $1 \le i \le n$,
satisfy the condition that $\Sigma_n/n$ converges to a positive definite matrix
([8, 32] among others). This condition is not required in our setting. Consider
the polynomial regression with design vectors $x_i = (1, i, \ldots, i^{p-1})'$,
$1 \le i \le n$. Then $\Sigma_n/n$ does not have a limit. Elementary but tedious
calculations show that (A4) is satisfied and (9) holds with
$\Delta_k = \mathrm{Id}_p$. However, a condition of such type is needed in
deriving strong Bahadur representations; see Theorem 3.
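
The claim about the polynomial design can be checked numerically for a small case. The sketch below is not from the paper: it assumes $p = 2$ (design vectors $(1, i)'$), a finite $n$, and a hypothetical helper `lagged_design_matrix` that evaluates the finite-$n$ sum appearing in (9) for a lag $k \ge 0$.

```python
import numpy as np

def lagged_design_matrix(X, k):
    """Finite-n version of (9): sum_{i=1}^{n-k} z_{i,n} z_{i+k,n}' for a lag k >= 0."""
    Sigma = X.T @ X
    w, V = np.linalg.eigh(Sigma)
    Z = X @ (V @ np.diag(w ** -0.5) @ V.T)   # rows are z_{i,n}'
    n = Z.shape[0]
    return Z[: n - k].T @ Z[k:]              # pairs row i with row i + k

n = 2000
X = np.column_stack([np.ones(n), np.arange(1, n + 1, dtype=float)])
print(np.round(lagged_design_matrix(X, 1), 3))   # close to the 2 x 2 identity
print(np.round(X.T @ X / n, 1))                  # entries grow with n: no limit for Sigma_n / n
```

For lag $k = 0$ the sum is exactly $\mathrm{Id}_p$; for fixed $k \ge 1$ the deviation from the identity shrinks as $n$ grows, in line with the remark.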

Remark 2. In the expression of $\Delta$ in (10), the presence of the terms
$E[\psi(e_0)\psi(e_k)]\Delta_k$, $k \ne 0$, is due to the dependence of $(e_i)$.

Remark 3. To apply Theorem 1 to quantile regression with
$\rho(x) = \rho_\alpha(x)$, since $\psi(x) = \alpha - 1_{x \le 0}$ and
$\varphi(x) = \alpha - F(-x)$, we need to ensure that $e_i$ has a density at 0;
see condition (A2). This problem is generally not easy. A simple sufficient
condition is that the conditional density $f_1(\cdot|\mathcal{F}_0)$ exists.
Without conditions of such type, the existence of marginal densities is not
guaranteed. For example, consider the process et = J2'i^o'^^t-i/y~^^, where 
$\varepsilon_t$ are i.i.d. and $P(\varepsilon_t = 1) = P(\varepsilon_t = -1) = 1/2$.
Then the conditional density does not exist and the marginal distribution does
not have a density either. Solomyak [52] considered the absolute continuity of
$\sum_{i=0}^\infty r^i \varepsilon_{t-i}$, $r \in (0, 1)$.

2.2. Bahadur representations. Bahadur representations with appropriate rates
are useful in the study of the asymptotic behavior of statistical estimators.
For M-estimation under independent errors, various Bahadur representations
have been derived; see, for example, [2, 4, 11, 26] and [47] among others. In
particular, He and Shao [26] obtained a sharp almost sure bound under very
general conditions on $\rho$. To obtain approximation rates for M-estimates of
linear models with dependent errors, we need extra conditions on the behavior
of the function $\psi_1(s; \mathcal{F}_0)$ at the neighborhood of $s = 0$:

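(A5) There exists an $\epsilon_0 > 0$ such that

(13)  $L_i := \sup_{|s|, |t| \le \epsilon_0,\ s \ne t} \frac{|\psi_1(s; \mathcal{F}_i) - \psi_1(t; \mathcal{F}_i)|}{|s - t|} \in \mathcal{L}^1$.

(A6) Let $\psi_1(\cdot; \mathcal{F}_0) \in \mathcal{C}^l$, $l \ge 0$. For some
$\epsilon_0 > 0$,
$\sup_{|\epsilon| \le \epsilon_0} \|\psi_1^{(l)}(\epsilon; \mathcal{F}_0)\| < \infty$
and

(14)  $\sum_{i=0}^\infty \sup_{|\epsilon| \le \epsilon_0} \|E[\psi_1^{(l)}(\epsilon; \mathcal{F}_i)|\mathcal{F}_0] - E[\psi_1^{(l)}(\epsilon; \mathcal{F}_i^*)|\mathcal{F}_0^*]\| < \infty$.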

Condition (A5) suggests that the function $\psi_1(s; \mathcal{F}_i)$,
$|s| \le \epsilon_0$, is stochastically Lipschitz continuous at a neighborhood of
0. The function $\psi$ itself does not have to be Lipschitz continuous. Indeed,
for $\psi(x) = \rho_\alpha'(x) = \alpha - 1_{x \le 0}$, if the conditional density
$f_1(\cdot|\mathcal{F}_0)$ is bounded, then (13) holds. For this $\psi$ we need to
assume that the conditional density exists to ensure that $e_i$ has a density,
which is a prerequisite for Bahadur representations for quantile estimates
(cf. Remark 3). If $\psi \in \mathcal{C}^{l+1}$ satisfies
$\sup_u |\psi^{(k)}(u)| < \infty$, $1 \le k \le l + 1$, then (13) holds and a
sufficient condition for (14) is that $\sum_{i=0}^\infty \|e_i - e_i^*\| < \infty$.
In this case the existence of a conditional density is not required. Since
$E[\psi(e_1 - s)|\mathcal{F}_0] = \int_{\mathbb{R}} \psi(v) f_1(v + s|\mathcal{F}_0)\,dv$,
by Fubini's theorem, a sufficient condition for (13) is
$\int_{\mathbb{R}} |f_1'(u|\mathcal{F}_0)| \bar\psi(u; \epsilon_0)\,du \in \mathcal{L}^1$.
The latter holds if
$\int_{\mathbb{R}} \|f_1'(u|\mathcal{F}_0)\| \bar\psi(u; \epsilon_0)\,du < \infty$.
The last condition (A6) is a generalization of (7). Section 2.3 gives
sufficient conditions for (14).
Define M-processes $K_n(\theta) = \Omega_n(\theta) - E[\Omega_n(\theta)]$ and
$\tilde K_n(\beta) = \tilde\Omega_n(\beta) - E[\tilde\Omega_n(\beta)]$, where

(15)  $\Omega_n(\theta) = \sum_{i=1}^n \psi(e_i - z_{i,n}'\theta) z_{i,n}$  and  $\tilde\Omega_n(\beta) = \sum_{i=1}^n \psi(e_i - x_i'\beta) x_i$,  $\theta, \beta \in \mathbb{R}^p$.

The M-process itself is an interesting subject of study and it plays an
important role in M-estimation theory. Welsh [58] considered M-processes for
linear models with i.i.d. errors. Theorems 2 and 3 present local oscillation
rates for the M-processes $K_n$ and $\tilde K_n$. Corollary 1 provides a weak
Bahadur representation for $\hat\theta_n$. Theorem 3 deals with $\tilde K_n$ and
gives a strong Bahadur representation for $\hat\beta_n$.

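Theorem 2. Assume (A1)-(A5) and assume (A6) holds with $l = 0, \ldots, p$.
Let $(\delta_n)_{n \in \mathbb{N}}$ be a sequence of positive numbers such that

(16)  $\delta_n \to \infty$  and  $\delta_n r_n = \delta_n \max_{i \le n} |z_{i,n}| \to 0$.

Then

(17)  $\sup_{|\theta| \le \delta_n} |K_n(\theta) - K_n(0)| = O_P[\sqrt{\tau_n(\delta_n)} \log n + \delta_n \sqrt{\zeta_n(4)}]$,

where

(18)  $\tau_n(\delta) = \sum_{i=1}^n |z_{i,n}|^2 [m^2(|z_{i,n}|\delta) + m^2(-|z_{i,n}|\delta)]$,  $\delta > 0$.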

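Corollary 1. Assume (A1)-(A5) and assume (A6) holds with $l = 0, \ldots, p$,
and $\varphi(t) = t\varphi'(0) + O(t^2)$ as $t \to 0$. Further assume
$\Omega_n(\hat\theta_n) = O_P(r_n)$. Then for any sequence $c_n \to \infty$,

(19)  $\varphi'(0)\hat\theta_n - \sum_{i=1}^n \psi(e_i) z_{i,n} = O_P[\sqrt{\tau_n(\delta_n)} \log n + \delta_n r_n]$,

where $\delta_n = \min(c_n, r_n^{-1/2})$. In particular, if as $t \to 0$,
$m(t) = O(|t|^\lambda)$ for some $\lambda > 0$, then

(20)  $\varphi'(0)\hat\theta_n - \sum_{i=1}^n \psi(e_i) z_{i,n} = O_P[\sqrt{\zeta_n(2 + 2\lambda)} \log n + r_n]$.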



Remark 4. If $\psi$ is continuous, it is easily seen that the minimizer
$\hat\theta_n$ solves the equation $\Omega_n(\hat\theta_n) = 0$. In the case that
$\psi$ is discontinuous, the latter equation may not have a solution. To
overcome this difficulty, in Corollary 1 we propose the approximate equation
$\Omega_n(\hat\theta_n) = O_P(r_n)$. An important example for discontinuous
$\psi$ arises in quantile regression. Let
$\psi(x) = \rho_\alpha'(x) = \alpha - 1_{x \le 0}$, $0 < \alpha < 1$. The argument
in Corollary 2 and Lemma 9 implies that the minimizer $\hat\theta_n$ satisfies
$|\Omega_n(\hat\theta_n)| \le (p + 1) r_n$ almost surely.

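Theorem 3. (a) Assume (A1)-(A3), (A5) and assume (A6) holds with
$l = 0, \ldots, p$. (b) Let $\lambda_n$ be the minimum eigenvalue of $\Sigma_n$.
Assume that

$\liminf_{n \to \infty} \lambda_n/n > 0$,  $\xi_n(2) = O(n)$

and

(21)  $\tilde r_n := \max_{i \le n} |x_i| = O[n^{1/2}(\log n)^{-2}]$.

Let $b_n = n^{-1/2}(\log n)^{3/2}(\log\log n)^{1/2+\iota}$, $\iota > 0$,
$\tilde n = 2^{\lceil \log n/\log 2 \rceil}$ and $q > 3/2$. Then (i)

(22)  $\sup_{|\beta| \le b_n} |\tilde K_n(\beta) - \tilde K_n(0)| = O_{a.s.}(L_{\tilde n} + B_{\tilde n})$,

where $B_n = b_n \sqrt{\xi_n(4)} (\log n)^{3/2} (\log\log n)^{(1+\iota)/2}$,
$L_n = \sqrt{\tilde\tau_n(2b_n)} (\log n)^q$ and

(23)  $\tilde\tau_n(\delta) = \sum_{i=1}^n |x_i|^2 [m^2(|x_i|\delta) + m^2(-|x_i|\delta)]$,  $\delta > 0$.

If additionally $\varphi(t) = t\varphi'(0) + O(t^2)$ and $m(t) = O(\sqrt{t})$ as
$t \to 0$ and $\tilde\Omega_n(\hat\beta_n) = O_{a.s.}(\tilde r_n)$, then (ii)
$\hat\beta_n = O_{a.s.}(b_n)$ and (iii) the strong Bahadur representation holds:

(24)  $\varphi'(0)\Sigma_n\hat\beta_n - \sum_{i=1}^n \psi(e_i) x_i = O_{a.s.}(L_{\tilde n} + B_{\tilde n} + \xi_n(3) b_n^2 + \tilde r_n)$.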

Corollary 2. Assume that 0 is the $\alpha$th quantile of $e_i$, $f(0) > 0$ and
there is a constant Cq <oo such that sup„ fi{u\To) < Co, f , sup^gjg ||F^^'^(u| 
J^i)\\ < 00 and 

(25)  $\sum_{i=0}^\infty \sup_u \|E[F_1^{(l)}(u|\mathcal{F}_i)|\mathcal{F}_0] - E[F_1^{(l)}(u|\mathcal{F}_i^*)|\mathcal{F}_0^*]\| < \infty$,  $l = 0, \ldots, p$.

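Assume that $x_i$ satisfies conditions (b) in Theorem 3. Let $\hat\beta_n$ be a
minimizer of (2) with $\rho = \rho_\alpha$. Then (i)
$\hat\beta_n = O_{a.s.}(b_n)$ and (ii) the strong Bahadur representation holds:

(26)  $f(0)\Sigma_n\hat\beta_n - \sum_{i=1}^n (\alpha - 1_{e_i \le 0}) x_i = O_{a.s.}(B_{\tilde n} + L_{\tilde n} + \xi_n(3) b_n^2 + \tilde r_n)$,

where $B_n = b_n \sqrt{\xi_n(4)} (\log n)^{3/2} (\log\log n)^{(1+\iota)/2}$ and
$L_n = [b_n \xi_n(3)]^{1/2} (\log n)^q$.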
We now discuss the bound in (24). Clearly the condition $\xi_n(2) = O(n)$
implies that $\tilde r_n = \max_{i \le n} |x_i| = O(n^{1/2})$. The condition on
$\tilde r_n$ in (21) is not the weakest possible. For presentational clarity we
adopt (21) since otherwise it involves quite tedious manipulations. If $\psi$
is continuous, then $\tilde\Omega_n(\hat\beta_n) = 0$. If additionally
$\xi_n(4) = O(n)$ and $m(t) = O(|t|^\lambda)$, $1/2 \le \lambda \le 1$, the bound in
(24) is $O[n^{(1-\lambda)/2}(\log n)^{q'}]$, $q' > 3$. For the bound in (26),
elementary calculations show that, if $\xi_n(\kappa) = O(n)$, $2 < \kappa < 4$,
then the bound is $O_{a.s.}(n^{1/\kappa})$; if $\xi_n(4) = O(n)$, then the bound
becomes $O[n^{1/4}(\log n)^{q'}]$, $q' > 9/4$. The latter bound is optimal up to a
multiplicative logarithmic factor since the classical [6] representation has
the bound $O_{a.s.}[n^{-3/4}(\log\log n)^{3/4}]$.

If $\Sigma_n/n$ converges to a positive definite matrix $Q$ (say), then
$\xi_n(2) = O(n)$ and the limit of $\lambda_n/n$ is the smallest eigenvalue of
$Q$, which is strictly positive. To apply Theorems 2 and 3 and Corollary 1, we
also need to verify $\varphi(t) = t\varphi'(0) + O(t^2)$ and know the order of
magnitude of $m(\cdot)$; see the definitions of $\tau_n(\cdot)$ and
$\tilde\tau_n(\cdot)$ by (18) and (23). Examples 1 and 2 below concern some
commonly used $\rho$. Recall that $f$ is the density of $e_i$.



Example 1. Assume (16). If $\psi$ has a derivative $\psi'$ satisfying
$\sup_{|u| \le \delta} \|\psi'(e_1 + u)\| < \infty$ for some $\delta > 0$, then
$m(t) = O(|t|)$ as $t \to 0$ and $\tau_n(\delta_n) = O[\zeta_n(4)\delta_n^2]$. The
latter claim easily follows from
$\psi(e_1 + t) - \psi(e_1) = \int_0^t \psi'(e_1 + u)\,du$ and
$m^2(t) \le t \int_0^t \|\psi'(e_1 + u)\|^2\,du = O(t^2)$. An important example is
Huber's function $\rho(x) = (x^2 1_{|x| \le c})/2 + (c|x| - c^2/2) 1_{|x| > c}$,
$c > 0$. Then $\psi(x) = \max[\min(x, c), -c]$ and
$\varphi'(t) = P(|e_1 + t| \le c) = F(c - t) - F(-c - t)$. If
$\sup_x f(x) < \infty$, then $\varphi(t) = t\varphi'(0) + O(t^2)$ and
$m(t) = O(|t|)$ as $t \to 0$.
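
The formula for $\varphi'$ in the Huber case can be sanity-checked numerically. The following is a rough sketch under assumptions that are not in the text: the errors are taken to be standard normal, $\varphi$ is approximated by a Riemann sum on a fixed grid, and the names `phi` and `phi_prime_formula` are hypothetical. The finite-difference slope of $\varphi$ at 0 should then agree with $F(c) - F(-c)$.

```python
import math
import numpy as np

def normal_cdf(x):
    """Standard normal distribution function F."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def huber_psi(x, c):
    """psi(x) = max(min(x, c), -c)."""
    return np.clip(x, -c, c)

def phi(t, c):
    """varphi(t) = E[psi(e_1 + t)] for standard normal e_1, via a Riemann sum."""
    grid = np.linspace(-10.0, 10.0, 200001)
    h = grid[1] - grid[0]
    dens = np.exp(-0.5 * grid**2) / math.sqrt(2.0 * math.pi)
    return float(np.sum(huber_psi(grid + t, c) * dens) * h)

def phi_prime_formula(t, c):
    """varphi'(t) = F(c - t) - F(-c - t)."""
    return normal_cdf(c - t) - normal_cdf(-c - t)

c, step = 1.345, 1e-3
slope = (phi(step, c) - phi(-step, c)) / (2.0 * step)
print(slope, phi_prime_formula(0.0, c))   # the two values should nearly coincide
```

By symmetry $\varphi(0) = 0$ here, so (A1) holds, and the strictly positive slope corresponds to (A2).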

Example 2 ($L^q$-regression estimates). Assume (16). Let $\rho(t) = |t|^q$,
$1 < q \le 2$, and assume $\sup_u f(u) < \infty$. If $q \ne 3/2$, then
$m(t) = O(|t|^{q'/2})$ and $\tau_n(\delta_n) = O[\zeta_n(2 + q')\delta_n^{q'}]$,
where $q' = \min(2, 2q - 1)$. If $q = 3/2$, then
$m(t) = O[|t| \sqrt{\log(1/|t|)}]$ and
$\tau_n(\delta_n) = \sum_{i=1}^n |z_{i,n}|^4 |\log |z_{i,n}||\, O(\delta_n^2)$. Here
the bound of $m(t)$ follows from [2]. The bound of $\tau_n(\delta_n)$ when
$q \ne 3/2$ can easily be obtained. If $q = 3/2$, since $r_n\delta_n \to 0$, then
$|\log(|z_{i,n}|\delta_n)| \le 2|\log |z_{i,n}||$ for sufficiently large $n$ and
the stated bound for $\tau_n(\delta_n)$ follows.

If $\sup_x [f(x) + |f'(x)|] < \infty$ and $e_0 \in \mathcal{L}^{q-1}$, then
$\varphi(t) = t\varphi'(0) + O(t^2)$. To this end note that
$\psi(x) = q|x|^{q-1}\mathrm{sgn}(x)$ and $\psi'(x) = q(q - 1)|x|^{q-2}$. Let
$|\delta| \le 1$ and $D = \psi'(e_1 + \delta) - \psi'(e_1)$. If $|e_1| \ge 3$, then
$|D| \le C|\delta|$. On the other hand,

$E(D 1_{|e_1| < 3}) = \int_{-3}^{3} \psi'(u)[f(u - \delta) - f(u)]\,du + \Big(\int_{3}^{3+\delta} - \int_{-3}^{-3+\delta}\Big) \psi'(u) f(u - \delta)\,du$,

which is also of the order $O(\delta)$ since $\sup_x |f'(x)| < \infty$ and
$\int_{-4}^{4} |\psi'(u)|\,du < \infty$.

Therefore $E\{\int_0^t [\psi'(e_1 + \delta) - \psi'(e_1)]\,d\delta\} = O(t^2)$, which
implies $\varphi(t) - \varphi(0) = t\varphi'(0) + O(t^2)$.

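2.3. Sufficient conditions for (14). Recall the projections
$\mathcal{P}_k \cdot = E(\cdot|\mathcal{F}_k) - E(\cdot|\mathcal{F}_{k-1})$ and
$\mathcal{F}_k^* = (\ldots, \varepsilon_{-1}, \varepsilon_0', \varepsilon_1, \ldots, \varepsilon_k)$.
Proposition 2 provides sufficient conditions for (14) and (25). These
sufficient conditions appear easy to work with; see applications in Section 3.
Lemma 1 follows from Theorem 1 in [63].

Lemma 1. Assume that the process $X_t = g(\mathcal{F}_t) \in \mathcal{L}^2$. Let
$g_n(\mathcal{F}_0) = E[g(\mathcal{F}_n)|\mathcal{F}_0]$, $n \ge 0$. Then
$\|\mathcal{P}_0 X_n\| \le \|g(\mathcal{F}_n) - g(\mathcal{F}_n^*)\|$ and
$\|\mathcal{P}_0 X_n\| \le \|g_n(\mathcal{F}_0) - g_n(\mathcal{F}_0^*)\| \le 2\|\mathcal{P}_0 X_n\|$.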

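Proposition 2. (i) Assume that $f_1(\cdot|\mathcal{F}_i) \in \mathcal{C}^l$,
$l \ge 0$, and

(27)  $\sum_{i=0}^\infty \bar\omega_l(i) < \infty$,  where  $\bar\omega_l(i) = \int_{\mathbb{R}} \|f_1^{(l)}(u|\mathcal{F}_i) - f_1^{(l)}(u|\mathcal{F}_i^*)\| \bar\psi(u; \epsilon_0)\,du$.

Then
$\sum_{i=0}^\infty \sup_{|\epsilon| \le \epsilon_0} \|\psi_1^{(l)}(\epsilon; \mathcal{F}_i) - \psi_1^{(l)}(\epsilon; \mathcal{F}_i^*)\| < \infty$
and (14) holds.

(ii) Let $\rho(x) = \rho_\alpha(x) = \alpha x^+ + (1 - \alpha)(-x)^+$,
$0 < \alpha < 1$. Then (25) holds if for $0 \le l \le p$

(28)  $\sum_{i=0}^\infty \omega_l(i) < \infty$,  where  $\omega_l(i) = \sup_{u \in \mathbb{R}} \|F_1^{(l)}(u|\mathcal{F}_i) - F_1^{(l)}(u|\mathcal{F}_i^*)\|$.

Proof. (i) Since
$\psi_1(t; \mathcal{F}_i) = \int_{\mathbb{R}} \psi(v) f_1(v - t|\mathcal{F}_i)\,dv$,
by Lemma 1, for $|t| \le \epsilon_0$,

$\|E[\psi_1^{(l)}(t; \mathcal{F}_i)|\mathcal{F}_0] - E[\psi_1^{(l)}(t; \mathcal{F}_i^*)|\mathcal{F}_0^*]\|$
  $\le 2\|\psi_1^{(l)}(t; \mathcal{F}_i) - \psi_1^{(l)}(t; \mathcal{F}_i^*)\|$
  $= 2\Big\|\int_{\mathbb{R}} \psi(v)[f_1^{(l)}(v - t|\mathcal{F}_i) - f_1^{(l)}(v - t|\mathcal{F}_i^*)]\,dv\Big\|$
  $\le 2\int_{\mathbb{R}} |\psi(v)|\, \|f_1^{(l)}(v - t|\mathcal{F}_i) - f_1^{(l)}(v - t|\mathcal{F}_i^*)\|\,dv \le 2\bar\omega_l(i)$.

So (i) holds.

(ii) This easily follows from Lemma 1. □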
3. Applications. This section contains applications of results in Section
2 to linear models with errors being linear processes and some widely used
nonlinear time series. For such processes the SRD conditions (27) and (28)
can be verified.

3.1. Linear processes. Let $\varepsilon_i$ be i.i.d. random variables and $a_j$
real numbers such that

(29)  $e_i = \sum_{j=0}^\infty a_j \varepsilon_{i-j}$

exists. Without loss of generality let $a_0 = 1$. Let $F_\varepsilon$ be the
distribution function of $\varepsilon_0$ and let $f_\varepsilon$ be its density.
Propositions 3 and 4 provide simple sufficient conditions for (28) and (27),
respectively. For $\gamma \in \mathbb{R}$ let the weighted measure
$w_\gamma(du) = (1 + |u|)^\gamma\,du$. The proof of Proposition 4 is given in [64].

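Proposition 3. Assume that $\varepsilon_0 \in \mathcal{L}^q$, $q > 0$, and that
for some $C_0 < \infty$,

(30)  $\sup_u |f_\varepsilon^{(l)}(u)| < C_0$,  $l = 0, \ldots, p$.

Then $\omega_l(i) = O(|a_i|^{q'/2})$, $q' = \min(2, q)$. Consequently (28) holds if

(31)  $\sum_{j=0}^\infty |a_j|^{q'/2} < \infty$.

Proof. Let $Z_n = \sum_{j=1}^\infty a_j \varepsilon_{n-j}$ and
$Z_n^* = Z_n - a_n\varepsilon_0 + a_n\varepsilon_0'$. Then
$F_1(u|\mathcal{F}_{n-1}) = F_\varepsilon(u - Z_n)$. By (30), since
$\min(1, |x|^2) \le |x|^\delta$, $0 \le \delta \le 2$,

(32)  $\omega_l(n) = \sup_u \|F_\varepsilon^{(l)}(u - Z_n) - F_\varepsilon^{(l)}(u - Z_n^*)\|$
        $\le \|\min(2C_0, C_0|a_n\varepsilon_0 - a_n\varepsilon_0'|)\|$
        $\le 2C_0 \||a_n\varepsilon_0 - a_n\varepsilon_0'|^{q'/2}\| = O(|a_n|^{q'/2})$.

The second assertion is obvious. □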

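Proposition 4. Let $1 < \gamma \le q$. Assume $E(\varepsilon_0) = 0$,
$\varepsilon_0 \in \mathcal{L}^q$,
$\kappa_\gamma = \int_{\mathbb{R}} \psi^2(u)\, w_{-\gamma}(du) < \infty$ and

(33)  $\sum_{l=0}^{p+1} \int_{\mathbb{R}} |f_\varepsilon^{(l)}(v)|^2\, w_\gamma(dv) < \infty$.

Then $\bar\omega_l(i) = O(|a_i|^{q'/2})$, $0 \le l \le p$, where $q' = \min(2, q)$,
and (27) holds under (31).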
The condition $\kappa_\gamma < \infty$ controls the tails of $\psi$. Both
Propositions 3 and 4 allow dependent and heavy-tailed errors. For the linear
model (1) with heavy-tailed errors, it is more desirable to apply the
M-estimation technique to estimate the unknown $\beta$ since the least squares
procedure may result in estimators with erratic behavior. A popular model for
such heavy-tailed processes is the moving average process
$e_i = \sum_{j=0}^\infty a_j \varepsilon_{i-j}$, where $\varepsilon_i$ are i.i.d.
random variables with stable distributions and $a_j$ are coefficients such that
$e_i$ is well defined. Recently there has been a substantial interest in linear
processes with heavy-tailed innovations; see [28, 54] and [61] among others.
Davis, Knight and Liu [14] studied the behavior of the M-estimator in causal
autoregressive models, while Davis and Wu [15] considered M-estimation in
linear models. In the latter two papers the errors are assumed to be
heavy-tailed but independent.

Example 3. Let $r \ge 0$ be an integer and $\iota \in (1, 2)$. Assume that the
density $f_\varepsilon \in \mathcal{C}^r$ satisfies
$f_\varepsilon^{(r)}(t) \sim |t|^{-1-r-\iota} h(|t|)$ as $t \to \pm\infty$, where
$h$ is a slowly varying function [31], namely
$\lim_{x \to \infty} h(x\lambda)/h(x) = 1$ for all $\lambda > 0$. By Karamata's
theorem [31], simple calculations show that there exist real constants $C_j$,
$0 \le j \le r$, such that
$f_\varepsilon^{(j)}(t) \sim C_j |t|^{-1-j-\iota} h(|t|)$ as $t \to \infty$, and,
for some constant $C$, $1 - F_\varepsilon(t) \sim C|t|^{-\iota} h(|t|)$ and
$F_\varepsilon(-t) \sim C|t|^{-\iota} h(|t|)$ as $t \to \infty$. So $\varepsilon_i$
is in the domain of attraction of a stable distribution with index $\iota$
[31]. Let $\gamma < 1 + 2\iota$. Then
$\int_{\mathbb{R}} [f_\varepsilon^{(j)}(t)]^2\, w_\gamma(dt) < \infty$,
$0 \le j \le r$. Hence (33) holds. As an interesting special case, let
$\varepsilon_i$ be i.i.d. standard symmetric-$\alpha$-stable (S$\alpha$S) random
variables with index $\iota \in (1, 2)$. By Theorem 2.4.2 in [31], the density
$f_\varepsilon(t) \sim c_\iota |t|^{-1-\iota}$ as $|t| \to \infty$, where
$c_\iota = \pi^{-1}\Gamma(1 + \iota)\sin(\iota\pi/2)$. A similar argument shows
that $f_\varepsilon^{(r)}(t) \sim c_{\iota,r} |t|^{-1-\iota-r}$, where
$c_{\iota,r} = c_\iota \prod_{k=1}^r (-k - \iota)$.

Example 4. If $\varepsilon_i$ have finite variance, then (31) with $q'/2 = 1$
implies that the covariances are absolutely summable. It seems that the robust
estimation problem of (1) with SRD linear process errors has been rarely
studied in the literature. If (31) is barely violated, for example, if
$a_n = n^{-\mu}$, $1/2 < \mu < 1$, then the errors are long-range dependent (LRD).
In the LRD case, the M-estimates behave very differently from those in the
i.i.d. or weakly dependent error cases in that they are asymptotically
first-order equivalent, in probability, to the least squares estimate [37].

Remark 5. The condition (31) seems almost necessary for the asymptotic
normality of $\hat\theta_n$. Let $\varepsilon_i$ be i.i.d. standard S$\alpha$S
random variables with index $\iota \in (1, 2)$ and $a_n \sim n^{-\mu}$,
$n \in \mathbb{N}$. Then (31) is reduced to $\iota\mu > 2$. Surgailis [54] showed
that, if $\iota\mu < 2$, then the empirical process of $e_i$ satisfies a
non-central limit theorem and the normalizing sequence is no longer $\sqrt{n}$.
The asymptotic distribution of our estimate $\hat\theta_n$ is unknown when
$\iota\mu < 2$.
3.2. Nonlinear time series. Many nonlinear time series models have the form

(34)  $e_i = R(e_{i-1}, \varepsilon_i)$,

where $R$ is a measurable function and $\varepsilon_i$ are i.i.d. innovations.
Diaconis and Freedman [16] showed that (34) has a stationary solution if for
some $q > 0$ and $t_0$,

(35)  $E(\log L_{\varepsilon_0}) < 0$  and  $L_{\varepsilon_0} + |R(t_0, \varepsilon_0)| \in \mathcal{L}^q$,
      where  $L_{\varepsilon_0} = \sup_{t \ne t'} \frac{|R(t, \varepsilon_0) - R(t', \varepsilon_0)|}{|t - t'|}$.

In this case iterations of (34) lead to (3). Due to the Markovian structure
of $(e_i)$, we can let $F_1(u|e_i) = F_1(u|\mathcal{F}_i)$ and
$f_1(u|e_i) = f_1(u|\mathcal{F}_i)$ be the conditional distribution and density
functions. Then $F_1(u|v) = P[R(v, \varepsilon_1) \le u]$ and
$f_1(u|v) = \partial F_1(u|v)/\partial u$.



Proposition 5. Assume that there exists a constant $C_0 < \infty$ such that

(36)  $\sup_{u,v} \Big|\frac{\partial F_1^{(l)}(u|v)}{\partial v}\Big| + \sup_{u,v} |F_1^{(l)}(u|v)| < C_0$,  $l = 0, \ldots, p$.

Then under (35) we have $\omega_l(i) = O(\chi^i)$ for some $\chi \in (0, 1)$ and
hence (28) holds.

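Proof. Let $(\varepsilon_i')_{i \in \mathbb{Z}}$ be an i.i.d. copy of
$(\varepsilon_i)_{i \in \mathbb{Z}}$ and, for $i \ge 0$,
$e_i^* = G(\ldots, \varepsilon_{-1}, \varepsilon_0', \varepsilon_1, \ldots, \varepsilon_i)$
and
$e_i' = G(\ldots, \varepsilon_{-1}', \varepsilon_0', \varepsilon_1, \ldots, \varepsilon_i)$.
By Theorem 2 in [65], under (35) there exist $\varsigma > 0$ and
$\varrho \in (0, 1)$ such that $\|e_i' - e_i\|_\varsigma = O(\varrho^i)$. So
$\|e_i^* - e_i\|_\varsigma = O(\|e_i' - e_i\|_\varsigma + \|e_i' - e_i^*\|_\varsigma) = O(\varrho^i)$.
Observe that $F_1^{(l)}(u|\mathcal{F}_{n-1}) = F_1^{(l)}(u|e_{n-1})$. As in (32),
by (36),

$\|F_1^{(l)}(u|\mathcal{F}_{n-1}) - F_1^{(l)}(u|\mathcal{F}_{n-1}^*)\| = \|F_1^{(l)}(u|e_{n-1}) - F_1^{(l)}(u|e_{n-1}^*)\|$
  $\le \|\min(2C_0, C_0|e_{n-1} - e_{n-1}^*|)\|$
  $\le 2C_0 \||e_{n-1} - e_{n-1}^*|^{\min(1, \varsigma/2)}\| = O(\chi^n)$,

where $\chi = \varrho^{\min(1, \varsigma/2)}$. □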

Example 5. Consider the autoregressive conditional heteroscedasticity
(ARCH) model

$e_i = \varepsilon_i \sqrt{a^2 + b^2 e_{i-1}^2}$,

where $\varepsilon_i$ are i.i.d. innovations and $a$, $b$ are real parameters
such that

(37)  $E(\log |b\varepsilon_0|) < 0$  and  $\varepsilon_0 \in \mathcal{L}^q$

for some $q > 0$. Then $L_{\varepsilon_0} = |b\varepsilon_0|$ and (35) holds. Note
that (37) imposes very mild moment conditions and it even allows $|e_i|$ to have
infinite mean. Let $F_\varepsilon$ (resp. $f_\varepsilon$) be the distribution
(resp. density) function of $\varepsilon_0$. Assume that $f_\varepsilon$
satisfies (30). Since $F_1(u|v) = F_\varepsilon(u/\sqrt{a^2 + b^2v^2})$, simple
calculations show that (30) implies (36). As an interesting special case, let
$\varepsilon_i$ have the standard Student $t$-distribution with degrees of
freedom $k > 0$. Then the density
$f_\varepsilon(t) = [k/(k + t^2)]^{(1+k)/2}/c_k$, where
$c_k = k^{1/2}B(1/2, k/2)$ and $B(\cdot, \cdot)$ is the beta function. Clearly
(30) and (35) are satisfied. In certain applications it is desirable to use
ARCH models with Student-$t$ innovations to allow heavy tails [56].
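
The role of the Lipschitz coefficient $|b\varepsilon_0|$ in the ARCH recursion can be illustrated pathwise. The sketch below is only an illustration under assumed values (a = 1, b = 0.5, Student-$t$ innovations with 3 degrees of freedom, hypothetical function names): two paths driven by the same innovations but started at different points stay within the product of the per-step Lipschitz factors times the initial gap.

```python
import numpy as np

def arch_path(e_start, eps, a, b):
    """Iterate e_i = eps_i * sqrt(a^2 + b^2 e_{i-1}^2), starting from e_start."""
    e = np.empty(len(eps) + 1)
    e[0] = e_start
    for i, innov in enumerate(eps, start=1):
        e[i] = innov * np.sqrt(a**2 + b**2 * e[i - 1] ** 2)
    return e

def lipschitz_bound(e0, e0_star, eps, b):
    """|e_0 - e_0^*| * prod_{j <= i} |b eps_j| for i = 0, 1, ..., len(eps)."""
    factors = np.cumprod(np.abs(b * eps))
    return abs(e0 - e0_star) * np.concatenate(([1.0], factors))

rng = np.random.default_rng(1)
eps = rng.standard_t(3, size=30)          # heavy-tailed innovations (assumed choice)
p1 = arch_path(2.0, eps, a=1.0, b=0.5)
p2 = arch_path(-1.0, eps, a=1.0, b=0.5)
bound = lipschitz_bound(2.0, -1.0, eps, b=0.5)
print(np.all(np.abs(p1 - p2) <= bound + 1e-9))   # the pathwise bound holds
```

When $E(\log|b\varepsilon_0|) < 0$ the product of the factors decays geometrically on average, which is the mechanism behind the geometric rate in Proposition 5.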

Proposition 6 below gives a sufficient condition for (27) for the process

(38)  $e_i = \nu(e_{i-1}) + \varepsilon_i$,

where $\nu$ is a Lipschitz continuous function such that the Lipschitz constant

(39)  $L_\nu := \sup_{a \ne b} \frac{|\nu(a) - \nu(b)|}{|a - b|} < 1$

and $E(|\varepsilon_i|^\alpha) < \infty$ for some $\alpha > 1$. The condition
$L_\nu < 1$ implies that the nonlinear time series (38) has a unique stationary
distribution. A prominent example of (38) is the threshold autoregressive
process $e_{i+1} = a_1 e_i^+ + a_2(-e_i)^+ + \varepsilon_{i+1}$, where $a_1, a_2$
are real coefficients [55]. In this example (39) is satisfied if
$\max(|a_1|, |a_2|) < 1$. If the process (34) is of the form (38), then
condition (27) can be simplified.

Proposition 6. Assume $\int_{\mathbb{R}} \psi^2(t)\, w_{-\gamma}(dt) < \infty$ for
some $\gamma > 1$. Further assume (39), $\varepsilon_i \in \mathcal{L}^q$,
$\gamma \le q < \gamma + 2$ and that $f_\varepsilon$ satisfies (33). Then there
exists $\chi \in (0, 1)$ such that

(40)  $\bar\omega_l(i) = O(\chi^i)$,  $0 \le l \le p$.

Remark 6. In Proposition 6, $\varepsilon_i$ are allowed to have infinite
variances. Proposition 6 follows from Theorem 2 in [65]. For a proof see [64].

4. Proofs of results in Section 2.2. 

Lemma 2. Let $T_n = \sum_{i=1}^n \psi(e_i) z_{i,n}$.
(i) Assume $E[\psi(e_i)] = 0$, $\|\psi(e_i)\| < \infty$ and

(41) $\sum_{i=1}^\infty \|E[\psi(e_i)|\mathcal{F}_0] - E[\psi(e_i^*)|\mathcal{F}_0^*]\| < \infty$.

Then $\|T_n\| = O(1)$.
(ii) If in addition (9) and (A4) hold, then $T_n \Rightarrow N(0, \Delta)$.

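Proof. (i) For $k \ge 0$ let $J_k = \sum_{i=1}^n \mathcal{P}_{i-k}\psi(e_i) z_{i,n}$.
Note that the summands of $J_k$ are martingale differences. By the
orthogonality, since $\sum_{i=1}^n z_{i,n} z_{i,n}' = \mathrm{Id}_p$ and $k \ge 0$,

(42) $\|J_k\|^2 = \sum_{i=1}^n \|\mathcal{P}_{i-k}\psi(e_i) z_{i,n}\|^2 = \sum_{i=1}^n |z_{i,n}|^2 \|\mathcal{P}_0\psi(e_k)\|^2 = p\|\mathcal{P}_0\psi(e_k)\|^2$.

By Lemma 1, $\|\mathcal{P}_0\psi(e_k)\| \le \|E[\psi(e_k)|\mathcal{F}_0] - E[\psi(e_k^*)|\mathcal{F}_0^*]\|$.
By (41), $\sum_{k=0}^\infty \|J_k\| < \infty$ and consequently $\|T_n\| = O(1)$
since $T_n = \sum_{k=0}^\infty J_k$.

(ii) We now show $T_n \Rightarrow N(0, \Delta)$. Let $c$ be a p-dimensional
column vector with $|c| = 1$ and $u_{i,n} = c' z_{i,n}$. By the Cramér-Wold
device, it suffices to verify that

(43) $\sum_{i=1}^n u_{i,n}\psi(e_i) \Rightarrow N(0, c'\Delta c)$.

Since $\sum_{i=1}^n z_{i,n} z_{i,n}' = \mathrm{Id}_p$,
$d_n^2 := \sum_{i=1}^n u_{i,n}^2 = c'c = 1$. By (A4),

(44) $\lim_{n\to\infty} \max_{i \le n} |u_{i,n}|/d_n = 0$.

By (9), for each $k \ge 0$,

(45) $\lim_{n\to\infty} \frac{\sum_{i=1}^{n-k} u_{i,n} u_{i+k,n}}{d_n^2} = c'\Delta_k c$.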

Write $\psi(e_i) = \sum_{j=0}^\infty \alpha_j \eta_{i,i-j}$, where
$\alpha_j = \|\mathcal{P}_{i-j}\psi(e_i)\|$ and
$\eta_{i,i-j} = \mathcal{P}_{i-j}\psi(e_i)/\alpha_j$. Then (41) entails
$\sum_{j=0}^\infty \alpha_j < \infty$. By the argument in the proof
of Theorem 1(i) in [25], (44) and (45) imply (43). (Theorem 1(i) in [25] is
not directly applicable: condition (5,a) therein requires $d_n \to \infty$
and condition (7) therein requires $\sum_{k \in \mathbb{Z}} c'\Delta_k c > 0$.
However, a careful examination of Hannan's [25] proof indicates that his
conditions (5,a) and (7) are not needed in deriving (43) from (44) and (45).
Also note that there is a typo in [25], (5,b). The correct version should be
of the form (44).) □

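Lemma 3. Let $(a_{ni})_{i=1}^n$, $n \in \mathbb{N}$, be a triangular array of
real numbers such that $\sum_{i=1}^n a_{ni}^2 \le 1$ and
$\varpi_n := \max_{i \le n} |a_{ni}| \to 0$. Assume (A1), (A3) and (7). Let
$\eta_{n,i} = \rho(e_i - a_{ni}) - \rho(e_i) + a_{ni}\psi(e_i)$. Then
$\mathrm{var}(\sum_{i=1}^n \eta_{n,i}) \to 0$.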

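Proof. Let $I \in \mathbb{N}$ and $0 < \mu < \varepsilon_0$. By (A3), (7) and
Lemma 1, we have

(46) $\sum_{k=0}^\infty \sup_{|\varepsilon| \le \mu} \|\mathcal{P}_0[\psi(e_k) - \psi(e_k - \varepsilon)]\| \le \sum_{k=0}^{I} \sup_{|\varepsilon| \le \mu} \|\psi(e_k) - \psi(e_k - \varepsilon)\| + 2\sum_{k=I+1}^\infty \sup_{|\varepsilon| \le \mu} \|\mathcal{P}_0\psi(e_k - \varepsilon)\| \to 0$

by first letting $\mu \to 0$ and then $I \to \infty$. Let
$A_n = \sum_{i=1}^n a_{ni}^2$ and

$Z_{k,n}(t) = \sum_{i=1}^n \mathcal{P}_{i-k}[\psi(e_i) - \psi(e_i - a_{ni}t)] a_{ni}$, $0 \le t \le 1$, $k \ge 0$.

Note that the summands of $Z_{k,n}(t)$ are martingale differences. Since
$\varpi_n \to 0$,

(47) $\sup_{0 \le t \le 1} \|Z_{k,n}(t)\|^2 = \sup_{0 \le t \le 1} \sum_{i=1}^n a_{ni}^2 \|\mathcal{P}_{i-k}[\psi(e_i) - \psi(e_i - a_{ni}t)]\|^2 \le A_n \sup_{|\varepsilon| \le \mu} \|\mathcal{P}_0[\psi(e_k) - \psi(e_k - \varepsilon)]\|^2$

holds for large $n$. Since
$\eta_{n,i} = \int_0^1 [\psi(e_i) - \psi(e_i - t a_{ni})] a_{ni}\,dt$ and
$A_n \le 1$,

$\left\| \sum_{i=1}^n [\eta_{n,i} - E(\eta_{n,i})] \right\| = \left\| \sum_{k=0}^\infty \int_0^1 Z_{k,n}(t)\,dt \right\| \le \sum_{k=0}^\infty \int_0^1 \|Z_{k,n}(t)\|\,dt$,

which by (46) and (47) converges to 0 as $n \to \infty$ and $\mu \downarrow 0$. □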



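Proposition 7. Under conditions of Theorem 1, we have for any $c > 0$
that

(48) $D_n(c) := \sup_{|\theta| \le c} \left| \sum_{i=1}^n [\rho(e_i - z_{i,n}'\theta) - \rho(e_i) + z_{i,n}'\theta\,\psi(e_i)] - \frac{\varphi'(0)}{2}|\theta|^2 \right| \to 0$

in probability.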



i=\ 







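Proof. We shall use the argument in [7]. Let
$\eta_i(\theta) = \rho(e_i - z_{i,n}'\theta) - \rho(e_i) + z_{i,n}'\theta\,\psi(e_i)$.
For a fixed vector $\theta$ with $|\theta| \le c$, let $a_{ni} = z_{i,n}'\theta$.
Then $\sum_{i=1}^n a_{ni}^2 \le c^2$ and $\max_{i \le n} |a_{ni}| \le c r_n \to 0$.
By Lemma 3, $\mathrm{var}[\sum_{i=1}^n \eta_i(\theta)] \to 0$. Note that
$\sum_{i=1}^n z_{i,n} z_{i,n}' = \mathrm{Id}_p$. By Lemma 1 in [7], under (A1)
and (A2) the bias

$\sum_{i=1}^n E[\eta_i(\theta)] - \frac{\varphi'(0)}{2}|\theta|^2 = \sum_{i=1}^n \left\{ E[\eta_i(\theta)] - \frac{\varphi'(0)}{2}|z_{i,n}'\theta|^2 \right\} = o[\zeta_n(2)]$.

So $\sum_{i=1}^n \eta_i(\theta) - \varphi'(0)|\theta|^2/2 \to 0$ pointwise, in
probability. Since $\eta_i(\theta)$, $1 \le i \le n$, are convex functions of
$\theta$, by the convexity lemma in ([42], page 187), we have uniform
in-probability convergence. (A nonstochastic version is given in [48],
Theorem 10.8, page 90. A subsequencing argument leads to the
in-probability-convergence version. See also Appendix II in [1] and [7] for
more details.)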
□

Proof of Theorem 1. The relation (8) easily follows from properties
of convex functions; see, for example, the proofs of Theorems 2.2 and 2.4
in [7] and Theorem 1 in [42]. We omit the details. A proof is given in [64].
That $\hat\theta_n = O_P(1)$ follows easily from Lemma 2 and (8). If (9)
holds, again by Lemma 2 we have the central limit theorem (10). □
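
To make the object of Theorem 1 concrete, the following is a minimal sketch of computing an M-estimate in a toy version of model (1) with dependent errors. Everything specific here is an assumption for illustration and is not prescribed by the paper: a scalar regressor with a periodic nonstochastic design, Gaussian AR(1) errors, Huber's $\rho$ with tuning constant c = 1.345 (a conventional choice), and iteratively reweighted least squares as the solver. At convergence the iteration satisfies the estimating equation $\sum_i \psi(y_i - x_i\beta) x_i = 0$ with $\psi = \rho'$.

```python
import random


def huber_psi(x, c):
    """psi = rho' for Huber's rho: identity on [-c, c], clipped outside."""
    return max(-c, min(c, x))


def huber_slope(xs, ys, c=1.345, iters=200, tol=1e-12):
    """Minimize sum_i rho(y_i - x_i * beta) over a scalar beta by
    iteratively reweighted least squares, starting from least squares."""
    sxx = sum(x * x for x in xs)
    beta = sum(x * y for x, y in zip(xs, ys)) / sxx
    for _ in range(iters):
        num = 0.0
        den = 0.0
        for x, y in zip(xs, ys):
            r = y - x * beta
            # Huber weight psi(r) / r, equal to 1 inside [-c, c].
            w = 1.0 if abs(r) <= c else c / abs(r)
            num += w * x * y
            den += w * x * x
        new_beta = num / den
        if abs(new_beta - beta) < tol:
            return new_beta
        beta = new_beta
    return beta


def simulate_linear_model(n, beta0, phi=0.5, seed=0):
    """Generate y_i = x_i * beta0 + e_i with AR(1) errors
    e_i = phi * e_{i-1} + N(0, 1) and a hypothetical periodic design."""
    rng = random.Random(seed)
    e = 0.0
    xs, ys = [], []
    for i in range(n):
        e = phi * e + rng.gauss(0.0, 1.0)
        x = 1.0 + (i % 5)
        xs.append(x)
        ys.append(x * beta0 + e)
    return xs, ys
```

Calling `huber_slope(*simulate_linear_model(2000, 2.0))` gives an estimate of the slope under serially dependent errors; repeating the experiment over many seeds would give a rough empirical view of the sampling distribution that the central limit theorem (10) describes.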

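Proof of Theorem 2. Write $K_n = M_n + N_n$, where

(49) $M_n(\theta) = \sum_{i=1}^n \{\psi(e_i - z_{i,n}'\theta) - E[\psi(e_i - z_{i,n}'\theta)|\mathcal{F}_{i-1}]\} z_{i,n}$

and, noting that
$E[\psi(e_i - z_{i,n}'\theta)|\mathcal{F}_{i-1}] = \psi_1(-z_{i,n}'\theta; \mathcal{F}_{i-1})$,

(50) $N_n(\theta) = \sum_{i=1}^n \{\psi_1(-z_{i,n}'\theta; \mathcal{F}_{i-1}) - \varphi(-z_{i,n}'\theta)\} z_{i,n}$.

The summands of $M_n$ form (triangular array) martingale differences with
respect to the filter $\sigma(\mathcal{F}_i)$. Since
$\sum_{i=1}^n z_{i,n} z_{i,n}' = \mathrm{Id}_p$, we have $n^{-1/2} = O(r_n)$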
and by (16), n~'^ = O(r^) = 0[5„Cn(4)]. By Lemmas 4 and 5, (17) follows 
since n"^ = 0[(5„Cn(4)]. □ 

Lemma 4. Assume (A5) and (16). Then

(51) $\sup_{|\theta| \le \delta_n} |M_n(\theta) - M_n(0)| = O_P[\sqrt{\tau_n(\delta_n)} \log n + n^{-3}]$.

Proof. Since p = J27=i '^i.n^i.n < nr^, (16) implies that bn = o{^/n). It 
suffices to show that the left-hand side of (51) has the bound Oj>[gny/Tn{Sn) logn + 
n~^] for any positive sequence gn oo. Assume that Qn^'^ for all n. Let 

0n = 2g„A/r„(d„)logn, t„ = — , Un = t^, 

rii{e) = [il){ei - 7!^ - il^{ei)]'Li^n, r„, = max sup \-r]i{G)\, 

|e|<5„ 

n 

Un = ^^{[i^{ei + |zj,ri|5n) - 1p{ei - |zj,„|(5„)]^|j^j_l}|zi,nP- 
i=l 

Since ip is monotone, for 5 > 0, 

sup \rii{e)\ < |zi,„|max[|'0(ej - |zi,„|J) - ^p{ei)\,\^p{ei + |zi,„|(5) - V'(ej)|] 
\s\<s 

< \zi^n\[4'{ei + |zi,n|<5) - "0(61 - |zi,n|5)]- 
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So E[supie|<5|7?,(0)|^] < 2|z,,,|^[m^(-|zi,„|5) +m^(|z,,,|5)], E(ri) < 2r„(5„) 
and 

(52) P(r„ > t„) < t-'E(r2) < 2t-Vn(5n) = 0[(5-' log5n)'] ^ 0. 
Similarly, E(C/n) < 27:„((5„) and 

(53) P(C/„ > Un) < K'E{Un) = 0[{g~^ log <7n)'] ^ 0. 

Write Zi^n = (-2ii,n; • • • , -^ip^n)'- For notational simplicity we write Zij for 
Zij,n, 1 < j'< p. Let Hp = For i G N let D^{i) = (2 x l^^^>o - 

1, . . . , 2 X lzip>o - 1) £ rip- For d G Hp and l<j<p define 

n 

M„,,- d(^) = ^{^'(e. - z:,„0) - niPiei - z:^„0)|^,_i]}z,,l^^(,)=d- 

Since $M_n = \sum_{d \in \Pi_p} (M_{n,1,d}, \ldots, M_{n,p,d})'$, it suffices to
show that (51) holds with $M_n$ therein replaced by $M_{n,j,d}$ for all
$d \in \Pi_p$ and $1 \le j \le p$. To this end, for presentational clarity we
consider $j = 1$ and $d = (1, -1, 1, 1, \ldots, 1)$. The other cases similarly
follow.

Let \0\ < 6n, rii,j,di^) = ["^(ci - ^'^ ,^9) - il){ei)]zijlD^{i)=d and 

n 

S„(0)=5^E[r?„-d(e)l|,,,^,,(,)|>tJ^.-i]. 

i=l 



Then for large n, since u„ = o{tn(pn), 



n\Bnm>(t>n,Un<Un)< 



(54) 



i=l 



Since 'Pi['?ij,d(^)l|r;, j d(f)l<tn]' ^ = l;---,'^; form bounded martingale differ- 
ences, by Proposition 2.1 in [19] and (54), for \9\ < 5n, 

F[\Mnj,d{0) - M„j-d(0)| > 2cl)n,Tn < tn, Un < Un] 



< 



{e)\<t„ 



1=1 



> (t>n-,Tn < tn,Un < Ur. 



(55) +P 



i=l 



— 4'n 1 -^n ^ i Un ^ 



: 0[exp{-</.2 /(4t„,/)„ + 2un)}] + n\Bn{e)\ > 0n, Tn < t„, [/„ < n„ 
:0[exp{-,/.2/(4t„</.„ + 2n„)}]. 
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Let i = n^ and Ge = {{h/i, kp/i) :ki£Z,\ki\< n^}. Note that Ge has 
(2n^ + ly points. By (55), since f„(/)„logn = o((/)^) and n„logn = o((/)^), for 
any ^ > 1 we have 



(56) 



sup |M„.i,d(0) - M„,,i.d(0)| > 2(l)n,Tn<tn,Un < Ur, 



= 0(n9P)0[exp{-</>2 /(4t„</.„, + 2n„)}] = 0{n-'P), 
which by (52) and (53) imphes 



(57) 



hm P 



sup \Mn,l,d{0) - Af„,i,d(0)| > 2ct)r, 



0. 



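For $a \in \mathbb{R}$ let $\lceil a \rceil_\ell = \lceil a\ell \rceil/\ell$ and
$\lfloor a \rfloor_\ell = \lfloor a\ell \rfloor/\ell$. Write
$\langle a \rangle_{\ell,1} = \lfloor a \rfloor_\ell$ and
$\langle a \rangle_{\ell,-1} = \lceil a \rceil_\ell$. Let
$d = (d_1, \ldots, d_p) \in \Pi_p$. For a vector
$\theta = (\theta_1, \ldots, \theta_p)'$ let
$\langle\theta\rangle_{\ell,d} = (\langle\theta_1\rangle_{\ell,d_1}, \ldots, \langle\theta_p\rangle_{\ell,d_p})$.
For example, for $d = (1, -1, 1, 1, \ldots, 1)$, we have
$\langle\theta\rangle_{\ell,d} = (\lfloor\theta_1\rfloor_\ell, \lceil\theta_2\rceil_\ell, \lfloor\theta_3\rfloor_\ell, \ldots, \lfloor\theta_p\rfloor_\ell)$
and
$\langle\theta\rangle_{\ell,-d} = (\lceil\theta_1\rceil_\ell, \lfloor\theta_2\rfloor_\ell, \lceil\theta_3\rceil_\ell, \ldots, \lceil\theta_p\rceil_\ell)$.
Observe that for this $d$,
$\eta_{i,1,d}(\langle\theta\rangle_{\ell,-d}) \le \eta_{i,1,d}(\theta) \le \eta_{i,1,d}(\langle\theta\rangle_{\ell,d})$
since $\psi$ is nondecreasing.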

Let |s|,|t| <r„5„. By (13), |E[^(ei - t) - V(e» - s)|.7^»_i]| < L,_i|s - i| for 
all large n since r„(^„ ^ 0. Let V„ = X^iLi ^i-i. Again by (13), F{Vn > n^) < 
n-^E(K) = 0(n-3). Since |0 - (0)f,d| = 0{l~^), we have maxi<„ |z^ „(e - 
(^)Ad)|=o(^-'), andby (16), 

n 

sup 5^ |E[77,((^)^,d) - V^m^^-l]\ < Cr'Vn- 

\e\<Sni=i 
Therefore, for all \9\ <6n, 

Mn,i,dmi,-d) - M„,i,d(0) - GVje 
(58) <M„,i,d(^)-M„,i,d(0) 

< M„,i,d((^)^,d) - M„,i,d(0) + CVn/£, 
which implies (51) in view of (57) and V„/£ = op(n^/£) = op{n~^). □ 

Lemma 5. Assume that ((5„) satisfies (16) and (A6) holds with I = 
0,...,p. Then 



(59) 



sup \Nn{g)-Nnm 

\s\<Sn 



Proof. Let I = {ai, . . . , Og} C {1, . . . ,p} be a nonempty set and 1 < 
ai < • • • < Uq. For a p-dimensional vector u = {ui, . . . , Up) let u/ = (uilig/, . . . , 
Uplpe/). Note that the jth component of u/ is if j ^ I, 1 < j < p. Write 



duj 



duj 



9^1 



duo,^ ■ ■ ■ dua^ 



dUn 



dUa, 







dmn{uj) 




duj 
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. Observe that 

n 

0, by Lemma 1, for 



i=l 



Let |u| <p6n and /c G N. Since maxj<„ |zi^nu| <prn5., 
large n, 

n 2 



(9), 



i=l 



< J2 sup ||E[<^(6;^fc_i)|^o] - E[^p['\e■,:FU)\m ■ 

1=1 kl<<:o 

As in the proof of (i) of Lemma 2, if (A6) holds with / = 0,...,p, then 

1 /2 

||59Af„(u7)/5u7|| = 0[C„' (2 + 2q)] uniformly over |u| <p6n- Consequently, 



sup 

sl<<5„"'0 



duj 



< 



(60) 



< 



rSn 




J — 5n 


duj 


rSn 




J-Sn 


duj 



duj 



0[6iej\2 + 2q)]. 



By (16), 6l^/CJjT2q} = 0[6n^/CJ^]. So (59) follows from the identity 

rsi d\'\Nn{uj) 



(61) 



Nn{s)-NniO) 



■dui, 



E 

/C{l,...,p}' 

where the summation is over all the 2^ — 1 nonempty subsets of {1, . . . ,p}. 

□ 



Proof of Corollary 1. The sequence $(\delta_n)$ clearly satisfies (16).
Theorem 1 implies that $|\hat\theta_n| = O_P(1) = o_P(\delta_n)$. Note that
$K_n(0) = \sum_{i=1}^n \psi(e_i) z_{i,n}$ and
$K_n(\hat\theta_n) = -\sum_{i=1}^n \varphi(-z_{i,n}'\hat\theta_n) z_{i,n} + O_P(r_n)$.
By Theorem 2, (19) follows from

$\sum_{i=1}^n [\varphi(-z_{i,n}'\hat\theta_n) + z_{i,n}'\hat\theta_n \varphi'(0)] z_{i,n} = \sum_{i=1}^n O[|z_{i,n}'\hat\theta_n|^2] |z_{i,n}| = O_P[\zeta_n(3)]$

in view of $\sum_{i=1}^n z_{i,n} z_{i,n}' = \mathrm{Id}_p$, $\zeta_n(2) = p$,
$\zeta_n(3) \le r_n \zeta_n(2) = O(r_n)$ and $\zeta_n(4) = O(r_n^2)$. For (20),
it suffices to show that the left-hand side has the bound
$O_P[c_n \sqrt{\zeta_n(2 + 2\lambda)} \log n + c_n r_n]$ for any sequence
$c_n \to \infty$. The latter easily follows from (19) and
$m(t) = O(|t|^\lambda)$. □

4.1. Proof of Theorem 3.

Lemma 6. Under the assumptions of Theorem 3, we have (i) 

(62) sup \KniP) - Kn{0)\ = 0^.,.{Ln + Bn) 

\l3\<bn 

and (ii) for any i > 0, -^n(O) = Oa.s. (/in); where hn = n^/^(log?i)^/^ x 
(loglogn)V2W4. 

Proof. As in the proof of Theorem 2, write Kn = Mn + Nn, where 

n 

1=1 

n 

^n(/3) = E{V'i(-x:/3;^i-i) - ^(-x^/3)}xi. 

i=l 

Since n~^/^ = o(i?fi), by Lemmas 7 and 8, (i) holds. For (ii), as with the ar- 

~ 1/2 

gument in (42), we have ||-ftr„(0)|| = 0(^n ) = 0{y/n). So the stated almost 
sure bound follows from the Borel-Cantelli lemma and (68) in view of the 
argument in (69). □ 

Lemma 7. Let {'Ki)i>i he a sequence of hounded positive numhers for 
which there exists a constant cq > 1 such that 

(63) max tTj < cq min VTj holds for all large n. 

n<i<2n n<i<2n 

Assume (A5) and fn = 0{^/n). Let Wd = 2covr2d and q > 3/2. Then as d^ 
oo, 



(64) sup max|M„(/3) - M„(0)| = 0^.s.[Jf24^d)d'' + 2~^''/\ 

l/3|<^d"<2'' 

Lemma 7 is proved in [64]. The argument is similar to the one used for 
Lemma 4. 

Lemma 8. Let (7rj)j>i be a positive sequence satisfying (63) and Tin = 
o[n~-^/^ (log n)^]; let Wd = 2cQiT2d . Assume (21) and assume (A6) holds with 
I = 0, . . . ,p. Then we have (i) 



sup |iV„(g)-A^„(0)| 

|g|<'rn 



0[VCn(4)^„] 



(65) 

and (ii) as d^ oo, for any i> 0, 

(66) max sup \Nnig) - Nnm^ = o,.,.[^24^)^ddH^ogd)^+'] 
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Proof. Let Q„j(/3) = ELi ^i(-x^/3;-F,_i)xi,-, l<j<p, and = 
Qn,j{P) - Qnj(O). Since 7r„ = o[n~^/'^ {log n)'^], by (21), vr^fn ^ 0. It is easily 
seen that the argument in the proof of Lemma 5 imphes that there exists a 
positive constant C < oo such that 



(67) 



E 



sup \SniP)-Sn'm^\<CY,wl'' 
\P\<^d J q=l i=n'+l 



2+2q 



holds uniformly over 1 <n' <n<2'^. So (65) holds. We now show (66). Let 
sup^ denote sup^^^^^^. By the maximal inequality (see [64]) 



max sup |5„(/?)| 

"<2'* \i3\<zu 



d 

<E 

i=0 



<2d—i 

Ei sup \S2^m{l^) - S2^(m-l){P)? \ 
m=l M/3|<^ >. 



(68) 

since i > and w'^J^^2'i{2 + 2q) = 0(ci7^^2<i(4)), (67) implies that 
„<2<i sup|^|<^ |S'„(/3)|jp _ ^ 



1/2 



(69) 



E 



max^ 



,2 -^z-v-y-^d"- ^=2 

By the Borel-Cantelli lemma, (66) holds in view of (63). □ 

Proof of Theorem 3. By Lemma 6, we have (i) and 
(70) sup \Kn{l3)\=Os„sXwn), wheie Wn = Ln + K + Bfi 

\l3\<bn 

and 6„ = n-i/2(iog „)3/2 (^^g ^)i/2+._ Let e„(/3) = ELi [/>(e^ - x^/?) - /9(ei)] 
and 

n „i 

A(/?) = -E / <^(-x:/3t)x^/3a!t. 
Using /9(ej) — /9(ej — x^/3) = /q^ ^'(ci — x^/3t)x^/?dt, we have by (70) that 



sup \Qn{P) - An{P)\ = sup 

W\<bn |/3|<bn 



Kn{f3t)fidt 



Oa..sXWnhn)- 



Let A* = 2-1 liminf„^oo A„,/n. By (21), f„5„ = o(l). Then e„(3)63 = 0(nf„)63 
o{nb'^). Since (/9(5) = 99(0) + '/^'(O)^ + 0(6'^), we have for all large n that 

inf A(/3) > |y''(0)nA*62. 

1/3 1 — 

Since m(t) = 0{Vi) as i ^ 0, L„ = 0[^y^(3)6y^(logn)9]. Let 3/2 < g < 7/4. 
Under the condition ^„(2) = 0(n), we have ^„(2 + k) < r^^„(2) = O(nf^), 
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K > 0. Elementary calculations show that, with (21), hn = o{nbn), = 
o{nbn) and Bn = o{nbn). Therefore, Wn = o(n6„) and consequently we have 

inf Qn{fi)> inf An{l3) - sup |G„(/?) - 

\P\=K \P\=bn |/3|<6n 

> y{{))nKhl + 0^,^,{wnhn) > \v'{0)nX,bl 
almost surely. By the convexity of the function @ni'), 

irif eM>y{0)nKbl\ = \jri{ en{/3)>y{0)nXX 

\m>bn ) i\P\=bn 

Therefore the minimizer /?„ satisfies <bn almost surely, 
(iii) Let \(3\ <bn- Since 6„f„ — > 0, by Taylor's expansion, 

n n 

- ^ <^(-x:/3)x, = ^[/(0)x^/3 + 0(|x^/?n]x, = ^'mnP + 0[en(3)62]. 

i=l i=l 

So (24) follows from (i) and (ii) in view of r2„(/3„) = Oa.s.(^n)- D 

Proof of Corollary 2. Clearly the condition sup„ /i(n|:ro) < Co < 
oo implies (A5). The other two conditions ip{t) =tip'{0) + 0(t^) and m(t) = 
0{\/t) as t — > easily follow. Then we have (Al)-(A3) and (A6). By Theorem 
3, it remains to show that r2„(/3„) = Oa.s.('^n)- Observe that =ol — 1x<o 
and pa{x + 5) - Pa{x) = 6ip{x) + 5+la;=o- Let QniP) = J2i=i[Paiei -y^'il^) - 
Pai^i)] and V = 0„(/3„). Then 

^'m ~, = Z^(-Xiv)^(e, - Xi/3„) + 2^(-x,v) + l^^^^,^^ 



-(^^n(/3n))'v + E(-^.v) + l^^_,^^. 



i=l 

Since 6„(/3„ + ev) > 6n(/3„) and (-x.v)+ < |x-||v|, we have \^n{(in)\ < 
X^iLi l^il^e =x'/3 ■ Lemma 9 and Schwarz's inequality, |0„(/3„)| < (p + 
l)fn almost surely. □ 

Lemma 9. Assume that $\sup_u |f_1(u|\mathcal{F}_0)| \le C_0$ almost surely for
some constant $C_0$. Then

(71) $\sup_\beta \sum_{i=1}^n 1_{e_i = x_i'\beta} |x_i| \le (p + 1)\tilde r_n$ almost surely.

Proof. It suffices to show
$P(e_{i_1} = x_{i_1}'\beta, \ldots, e_{i_{p+1}} = x_{i_{p+1}}'\beta \text{ for some } \beta) = 0$
for all $1 \le i_1 < \cdots < i_{p+1}$. To this end, the argument in [4] is
useful. Clearly we can find $u_1, \ldots, u_{p+1}$ with
$u_1^2 + \cdots + u_{p+1}^2 \neq 0$ such that
$u_1 x_{i_1} + \cdots + u_{p+1} x_{i_{p+1}} = 0$. Without loss of generality
let $u_{p+1} = 1$ and write $\eta = \sum_{j=1}^p u_j e_{i_j}$. Then
$P(e_{i_{p+1}} = -\eta \mid \mathcal{F}_{i_{p+1}-1}) = 0$ almost surely. So
$P(u_1 e_{i_1} + \cdots + u_{p+1} e_{i_{p+1}} = 0) = 0$, which completes the
proof. □
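
The proof above shows that, almost surely, no single $\beta$ can fit $p + 1$ of the errors exactly. A minimal sketch of what this looks like for least absolute deviation fitting with $p = 1$ is given below. It is illustrative only: the scalar no-intercept model, the function names and the weighted-median solver are assumptions made for the example, not the paper's construction. Because $|y_i - x_i\beta| = |x_i|\,|y_i/x_i - \beta|$, a minimizer of the LAD objective is a weighted median of the ratios $y_i/x_i$ with weights $|x_i|$, so the fit interpolates one observation and, for continuously distributed errors, typically no more.

```python
def lad_slope(xs, ys):
    """Minimize sum_i |y_i - x_i * beta| over a scalar beta.

    Assumes every x_i is nonzero. Returns a weighted median of the
    ratios y_i / x_i with weights |x_i|.
    """
    pairs = sorted((y / x, abs(x)) for x, y in zip(xs, ys))
    half = 0.5 * sum(w for _, w in pairs)
    acc = 0.0
    for ratio, w in pairs:
        acc += w
        if acc >= half:
            return ratio
    return pairs[-1][0]


def lad_objective(xs, ys, beta):
    """Value of the LAD criterion sum_i |y_i - x_i * beta|."""
    return sum(abs(y - x * beta) for x, y in zip(xs, ys))


def count_zero_residuals(xs, ys, beta, tol=1e-9):
    """Number of observations fitted exactly, up to a numerical tolerance."""
    return sum(1 for x, y in zip(xs, ys) if abs(y - x * beta) < tol)
```

Comparing `lad_objective` at the returned slope with its value at nearby slopes gives a quick check of optimality, and `count_zero_residuals` reports how many observations are interpolated.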
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